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Abstract 

Recent daily data of the Southern Oscillation Index have been analyzed. The power 
spectrum indicates major intrinsic geophysical short periods. We find interesting 
"high frequency" oscillations at 24, 27, 37, 76, 100 and 365 days. In particular the 
24 days peaks may correspond to the Branstator-Kushnir wave, the 27 days may be 
due to the moon effect rotation, the 37 days peaks is most probably related to the 
Madden and Julian Oscillation. It is not yet clear the explanations for the 76 days 
which may be associated with interseasonal oscillation in the tropical atmosphere; 
the 100 days could be resulting from a mere beat between the 37 and 27 periods, 
or the 76 and 365 days. Next these periods are used to reconstruct the signal and 
to produce a forecast for the next 9 months, at the time of writing. After cleansing 
the signal of those periodicities a detrended fluctuation analysis is performed to 
reveal the nature of the stochastic structures in the signal and whether specific 
correlation can be found. We study the evolution of the distribution of first return 
times, in particular between extreme events. A markedly significant difference from 
the expected distribution for uncorrelated events is found. 
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1 Introduction 



One of the most intriguing phenomena in climatology is known as El Nino /Southern 
Oscillation (ENSO), i.e. the more or less cyclic warming and cooling, mainly 
seen through an oscillation of about 6 — 8 years, of the eastern and central 
regions in the Pacific Ocean. El Nino is considered to be due to a disruption 
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of the ocean-atmospheric system in the tropical Pacific and is factually de- 
scribed by the so called Southern Oscillation Index (SOI) [3, 0, S, 0] , a proxy 
measure based on surface air pressure differences between Darwin, Australia 
and Tahiti, French Polynesia (if. Data are usually analyzed from monthly 
averages. 

Sustained negative values of the SOI often indicate El Nino episodes (see 
://www. bom. gov.au/climate/glossary/soi.shtml). These negative values are 
usually accompanied by sustained warming of the central and eastern tropi- 
cal Pacific Ocean, a decrease in the strength of the Pacific Trade Winds, and 
a reduction in rainfall over eastern and northern Australia. The most recent 
strong El Nino was in 1997/98. Positive values of the SOI are associated with 
stronger Pacific trade winds and warmer sea temperatures to the north of 
Australia, popularly known as a La Nina episode. Waters in the central and 
eastern tropical Pacific Ocean become cooler during this time. Together these 
give an increased probability that eastern and northern Australia will be wet- 
ter than normal. The most recent strong La Nina was in 1988/89; a moderate 
La Nina event occurred in 1998/99, which weakened back to neutral conditions 
before reforming for a shorter period in 1999/2000. This last event finished in 
Autumn 2000. 

In our opinion wide window filtering and averaging techniques are nowadays 
unnecessary to obtain interesting informations. In fact, one of the most impor- 
tant problems in quantitative weather forecasting is to understand the nature 
(or structure) of stochastic processes which underline the weather evolution. 
One aspect of interest is to determine characteristics of the fluctuation distri- 
bution in a signal p], leading to a probability distribution function (PDF) and 
their correlations. Empirical studies found that they are not (always) Gaus- 
sian distributed |7j]. It was recently found e.g. for the southern oscillation index 
(SOI) [7] that long-range correlations may exist between the fluctuations of 
the index (even if it is still matter of debate (§]; moreover the PDF's have 
so called heavy tails ([§]), both features being describable through a Fokker- 
Planck equation approach. For such highly non linear systems higher frequency 
inputs of initial conditions in numerical simulations should be clearly helpful. 
Computer time would also be reduced if scaling laws are found or known, and 
propose criteria/constraints in iterating models. 

Here below we report results of analysis of daily data downloaded from the 
Long Paddock - Climate Management Information for Rural Australia web site 
http://www. longpaddock. qld.gov. au/ SeasonalClimateOutlook/ SouthernOscil- 
lationlndex/ SOIDataFiles/ index, html for the time interval between Jan. 1999 
and March 2007. We look for intrinsic mode periodicity and short/long term 
correlations present in the daily time series. We find interesting "high fre- 
quency" oscillations at 24, 27, 37, 76, 100 and 365 days. For correlation studies 
we use the detrended fluctuation analysis (DFA) method, equivalent to finding 
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the Hurst exponent [1 



11 



, on the cleansed signal (to be defined later). 



Whence we reconstruct the signal, extrapolate for forecasting-like purpose for 
the next 9 months, at the time of writing. The presently objective observations 
seem to give some good agreement. We measure the so called area under the 
receiver operating curve as a test of forecasting quality [12[ Il3|. We get a 0.73 
value for the value which is usually considered quite good. 



Moreover considering the signal as a random walk we can estimate the law of 
"first returns" for barrier level crossing, giving some quantitative information 
on the distribution time intervals even between extreme events. A markedly 
significant difference from the expected distribution for uncorrelated events is 
found. The dynamics of ENSO being sufficiently well represented in the daily 
data, we concur that a daily time series may be quite usefully employed for 
predicting events. 



2 Data 



Daily values of the SOI index for the time interval between Jan. 1999 and 
March. 2007 were downloaded from the Long Paddock - Climate Management 
Information for Rural Australia web site http://www.longpaddock.qld.gov.au/ 
SeasonalClimateOutlook/ SouthernOscillationlndex/ SOIDataFiles/ index.html 
for the longest period available at the time of finishing this study, (the web- 
site is updated daily). The raw data series is normalized through the standard 
deviation S of the sea level pressure (SLP) at a station in Tahiti and the sea 
level pressure at a station in Darwin. 
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The data set consists of 3006 data points. The SOI daily data are plotted as 
a function of time in Fig. [TJ The mean and standard deviation of this time 
series are respectively [i = —0.05 and a = 1.57. 

It is sometimes stated that daily or weekly values of the SOI do not convey 
much in the way of useful information about the current state of the climate, 
and accordingly the Bureau of Meteorology does not issue them. Daily values 
in particular can fluctuate markedly because of daily weather patterns, and 
should not be used for climate purposes. We may disagree with this statement. 
There are indeed techniques which can sort out noise from coherent behavior 
and conversely. This shows the intent of using daily data. 
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3 Power spectrum 



Instead of empirically searching for modes 14| we directly estimate the in- 
trinsic fast (intra annual) modes from the power spectrum (PS) of the daily 
data following the algorithm proposed by [l5|. The PS is estimated using the 
Multitaper Method (MTM) [16l. Il7j] with K = 5 tapers. We chose this method 



because it is non parametric and reduces the variance of spectral estimates by 
using a small set of tapers Il7l |. The PS obtained through MTM is shown 
in Fig. [2k on a linear-log plot. The PS of daily SOI is tested for significance 
relative to the null hypothesis of red noise (Fig. [2k) and locally white noise 
(Fig. [3k)- A robust estimate of the red noise spectral background is found, 



following the work by 15], by minimizing the misfit between an analytical 



AR(1) (Auto Regressive process of order 1) red noise spectrum given by 



S(f) = S 



1-r 2 



1 - 2rcos(27r f/f N ) +r 2 



(2) 



and the adaptively weighted multitaper spectrum which has been firstly con- 
volved with a median smoother to reduce the weight of outliers in the least 
square fit. In equation (j2J) So is the average value of the power spectrum, r is 
the lag-one autocorrelation and Jn = 2/ At is the Nyquist frequency (At = 1 
day), the highest frequency that can be resolved with a sampling rate At. 
In the robust estimation So and r are considered as free parameters for the 
minimization procedure. The PS is also tested against a locally white noise 
hypothesis (Fig. [3k)- 

The 95% and 99% confidence level (showed in the plots) for peaks detection 
are determined from the appropriate quantile of the x-square distribution with 
v = 2K degrees of freedom. From the plots we can say that the red noise back- 
ground hypothesis is appropriate only for the first part of the spectrum (small 
frequency). In the frequency range [0; 0.1]Hz both hypotheses on background 
noise give similar results on the number and positions of peaks exceeding the 
99% confidence intervals. Quite different are instead the results for higher fre- 
quencies for which the spectrum of the SOI signal does not seem to be well 
fitted by a red noise spectrum. 

From figures [2b and [3b one can recognize the position of several peaks at 
specific days (or for well defined periods) in the power spectrum. The positions 
of the highest peaks exceeding the 99% confidence interval are found to be 
roughly at 24, 27, 37, 76, 100 and 365 days; we did not consider those peaks 
which had small power even if exceeding the 99% confidence level, i.e. for 
frequencies greater than 0.1 cycles/days. Moreover we realize that each is an 
integer value, which is a first rough approximation with respect to the true 
geophysical period representing some event. Yet, some of these periodicities 
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may be related to well known geophysical processes 18j. In particular the 
24 days peaks may correspond to the Branstator-Kushnir wave, a westward 
travelling wave with a period of about 23 days B 0, HHH, the 27 days 
may be due to the moon effect rotation, the 37 days peaks is most probably 
related to the extratropical wave, so called Madden and Julian Oscillation 
(MJO) 24|, |25j. It is not yet clear which explanations can be provided for the 
76 days which may be associated with interseasonal oscillation in the tropical 
atmosphere, as discussed by 



26 



analyzing monthly SOI data; the 100 days 
could be resulting from a mere beat between the 37 and 27 periods, or the 76 
and 365 days, which give respectively a 100 and a 96 day beating period, - or 
both jointly; the 365 days periodicity nature is obvious for those who regularly 
turn around the sun. 



3. 1 Signal reconstruction and forecasting 



The results of the previous section can be used to reconstruct and forecast the 
future behavior of the daily SOI signal. We assume that the SOI signal can 
be modeled in time by the following stochastic process 

SOI(t) = A 1 (t) + £ Ai(t) cos A + 0,) + e(t) (3) 



where e(t) is a noise component, the nature of which should be determined, 
and Ti are the periods obtained in the previous analysis. The values for Ai(t) 
and (f>i can be obtained from a time domain inversion of the spectral domain 



information contained in the K eigentapers [27|, |28| but we prefer to model 
Ai(t) as polynomial functions of degree n (in our case we consider n — 1, i.e. 
Ai(t) = Ai + Btf) and to obtain the free parameters from a least square fitting 
procedure. In Fig. [5] we show the signal (dashed line) and the reconstructing 
function (solid line). The periods used for reconstruction are the following (the 
parameters Ai, Bi and 6i obtained by the best fit procedure are summarized 
in Table 1): 70 months [9J, 365 days, 100 days, 76 days, 37 days, 27 days and 
24 days. 

The function so obtained is then removed from the original SOI time series. A 
cleansed signal is obtained and the MTM spectrum is estimated again. In Fig. 
[5] we show a comparison of the power spectrum before and after removing the 
harmonic components from the daily SOI. One can recognize the portion of the 
spectrum due to the harmonic components. Both spectra are compared with 
the 95% and 99% confidence levels as described above for the red noise hypoth- 
esis. Recall that one relevant erroneous ingredient is the assumption of integer 
values for the considered geophysical periods. The error may accumulate for 
moderately short time series, but this assumption is hardly unavoidable. 
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In Fig. [6] we show the same reconstruction function from 01 Jan. 2006 and a 
possible forecast, based on this function, for the SOI for the next year (after 
March 2007 until 01 Jan. 2008). This function is shown together with a 95% 
confidence interval obtained through a cross validation procedure where each 
new data point is predicted using a model fitted only on the previous data 
points (so called "leave one out" validation). 

In order to give another evaluation of the performance of our forecasting algo- 
rithm in predicting positive and negative SOI values we have also estimated 
the receiver operating curve (ROC) (plot in Fig. [7]). The ROC [l2j, [l3| is a 
widely used method to asses the validity of a binary classifier through the 
sensitivity and specificity as its discrimination threshold is varied. It is ob- 
tained as follows: in the first step SOI is digitalized based on the sign, the 
forecasted time series is similarly translated according to the signs (- and +). 
We then verify how good the forecast is by estimating the true positive and 
false positive rates (as shown in Figure [7]), in the whole time window, varying 
the threshold used to discriminate between positive and negative values. The 
last step consists in estimating the area under the ROC, often called ROC 
AUC for which we obtained a value of 0.73. 



4 Time correlations 



Thereafter we can look for correlations in the cleansed signals, in which the 
main trends have been removed, through some estimator. This is important to 
understand the nature of the stochastic component in equation ([3]). Different 
estimators can be found in the literature for the long and/or short range 
dependence of fluctuations correlations 29, 3(|, one of the most precise and /or 
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common is the detrended fluctuation analysis (DFA) method, see e.g. 
321 . |33| . The method has been used previously to identify whether long range 
correlations exist in non-st at ionary signals , in many research fields such as e.g. 
finance 



cardiac dynamics [36| and of course meteorology [37], |38|, [39j? 
and by so many others that no exhaustive list can be here given. For an 

Briefly, we recall that the signal time series 



34|,|35j 



extensive list of references see 33 



y(t) is first integrated, to 'mimic' a random walk Y(t). The time axis (from 
1 to N) is next divided into non-overlapping boxes of equal size n; one looks 
thereafter for the best (polynomial, of degree m) trend, z n>m , in each box, 
and calculates the root mean square deviation of the (integrated) signal with 
respect to z njTn in each box. The average of such values is taken at fixed box 
size n in order to obtain 



n 



\ 



N 



i=l 



(4) 
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The box size is next varied over all possible n values and F m (n) recalculated. 
The resulting function is expected to behave like n a indicating a scaling law. 
The value of a can be related to the fractal dimension and/or the Hurst 



exponent of the signal [11 



In Fig.[8]we show a detrended fluctuation analysis of the cleansed daily SOI for 
linear (m = 1, DFAl) and quadratic (m = 2, DFA2) detrending. The resulting 
F m (n) functions can be usefully compared, from our point of view, with DFAl 
of a synthetically generated Brownian motion with the same number of data 
points. It can be noticed looking at Fig [8] that F m {n) for the daily SOI, for 
both detrending, does not show a linear behavior for the whole range of n in 
the log-log plot, while the Brownian motion does. A not linear F m (n) function 
for the whole range of n may be due to the process not being long memory 



or to trends |33| still present in the cleansed data and which have not been 
removed from the algorithm used here, an open question therefore remains. 
Another method to asses long memory in stochastic processes is through the 
PS |4l|. We then show the adaptively weighted multitaper spectrum (S(f)) 
of the cleansed series (Fig. [HD in a log- log plot. Also in this linear 
behavior, which would imply S(f) ~ / , is not found in the whole frequency 
range indicating no simple self-affine behavior. 



5 Return times 



Another interesting point is the statistics of extreme (rare) events, namely 
those events that exceed a given threshold. Following previous studies we con- 
sidered the cleansed SOI signal as the analo g of a complex random walk and 



examine the distribution of first return times 42 . We studied the time interval 



r between two consecutive threshold crossing. The threshold q measures the 
strength of an event. The time series is firstly normalized such to have vari- 
ance equal to 1, then a threshold q — 1 is chosen. We chose the same threshold 
for positive and negative "extreme" values. The results in Fig. [10] show the 
distributions for the time interval between consecutive positive, negative and 
both positive and negative threshold crossing. The return time interval dis- 
tributions for cleansed SOI are compared with those of shuffled SOI. It is 
expected 43|, |44j that while the latter should follow a Poisson distribution, as 
for uncorrelated events, long range correlated process should follow a stretched 
exponential distribution 43|, [44|. The two kind of experimental distributions 
have been fitted with a Poisson distribution, for the shuffled SOI, and with a 
stretched exponential distribution, defined as 



P(r) 



u-b(r/Rr 
R 



(5) 
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for the cleansed SOI. In equation ((Sj) R is the mean value of r, a, b and 7 are 
free parameters. The exponent 7, for a long memory process x, is the same 
exponent that characterizes the autocorrelation function at lag s, C x (s) =< 
XiXi +s >~ s~ 7 . Figure [TU] shows that while the shuffled data are well fitted by 
a Poisson distribution the cleansed SOI does not follow a Poisson distribution 
and it is closer to a stretched exponential one even if, due to the shortage 
of data, this result cannot yet be asserted with a good statistical significance 
test. 



6 Conclusion 



In contrast with many other works on SOI we have considered daily fluctua- 
tions instead of the more often used monthly averaged data. We have searched 
therefore for high frequency features, and observed intra annual major periods 
: 24, 27, 37, 76, 100 and 365 days. At periods shorter than 100 days, there is 
known evidence for very energetic westward propagating sea level signals at 
low latitudes (equatorward of about 20 deg latitude). The nature of this in- 
traseasonal variability is usually thought to be distinctly different for periods 
longer and shorter than ca. 50 days. We have pointed out that each period 
is an integer value, which is a first rough approximation with respect to the 
true geophysical period representing some event. Yet, some of these period- 
icities may be related to well known geophysical processes. The 24 days may 
correspond to the Branstator-Kushnir wave, a westward travelling wave with 
a period of about 23 days 19, 2(3, 21, 22, 23], the 27 days may be tides due to 



the moon, the 37 days peaks is most probably related to the Madden - Julian 



Oscillation [24J, [25j. It is not yet clear which explanations can be provided for 
the 76 days which may be associated with interseasonal oscillation in the trop- 
ical atmosphere 261 ]; the 100 days could be resulting from the beat between 



the 37 and 27 periods, or the 76 and 365 days, which give respectively a 100 
and a 96 day beating period, - or both jointly. 

These periods have been used, by modeling the SOI as a quasi-oscillatory 
signal plus "noise" , to reconstruct the time series and extrapolate it for fore- 
casting purpose. The nature of the "noise" is also studied by removing from the 
SOI time series the reconstructed time series. Detrended fluctuation analysis 
together with power spectrum analysis have been employed in the attempt to 
understand the nature of the stochastic process showing that no simple scal- 
ing can be assessed. Through a systematic study of the distribution of first 
return times, we have indicated that extreme events cannot be considered 
uncorrelated. 

This paper has not primarily aimed at writing up a model nor interpreting 
the intrinsic modes, on one hand, - this is found in already published work, 
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but rather in pointing out that daily, quasi unaveraged data can be used for 
medium range and high frequency forecasting. 
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Table 1 

Values of the parameter used for forecasting with a superposition of quasi-oscillatory 
signals. 
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Fig. 1. Daily values of the Southern Oscillation Index (SOI) as defined 
in the text reported from 01 Jan. 1999 to 25 March 2007. Data are 
downloaded from http://www.longpaddock.qld. gov^^/jSeasonalClimateOutlook / 
SouthernOscillationlndex / S01DataFilesjindex.html. Data series consists of 
3006 data points. 
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Fig. 2. a) Adaptively weighted multitaper power spectrum of SOI daily data as 
function of frequency. Continuous lines are red noise confidence interval, b) same as 
before but plotted as function of time (days) to highlight the periodicities in days of 
the SOI intrinsic high frequency modes. Continuous lines are red noise confidence 
interval. 
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Fig. 3. a) Adaptively weighted multitaper power spectrum of SOI daily data as 
function of frequency. Continuous lines are locally white noise confidence interval, 
b) Same as before but plotted as function of time (days) to highlight the periodicities 
in days of the SOI intrinsic high frequency modes. Continuous lines are locally white 
noise confidence interval. 
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Fig. 4. Daily SOI signal and its reconstruction with basic periods as indicated in 
the text. 
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Fig. 5. Discrepancy between the adaptively weighted multitaper power spectrum 
of daily SOI (dashed line) and cleansed signal (solid line). The solid thicker lines 
represent the 90%, 95% and 99% confidence interval as explained in the text. 
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Fig. 6. Daily SOI (dashed line) together with the prediction (solid lines) based on 
the function described in the text. The upper solid and lower solid lines represent 
the 95% confidence interval estimated throught a cross validation procedure (see 
text for details). 
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Fig. 7. Receiver operating curve (ROC), -thick line compared to random estimates, 
-straight line. The shadowed area represent the ROC AUC (see text) 
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Fig. 8. DFA1 (triangles) and DFA2 (diamonds, amplitude divided by 10) of cleansed 
SOI compared with DFA1 (circles, amplitude multiplied by 10) of a Brownian mo- 
tion with the same data points (3006). The slope a for Brownian motion is 1.5. 
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Fig. 9. Log-log plot of the adaptively weighted multitaper power spectrum of 
cleansed SOI daily data as function of frequency. Gray shadow indicates 95% con- 
fidence intervals. 
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Fig. 10. Return interval distribution for events exceeding negative (diamonds), pos- 
itive (circles) and both positive and negative (triangles) threshold with q = 1 for 
cleansed SOI data (filled symbols) and shuffled cleansed SOI data (open symbols). 
The two type of distributions are fitted respectively with a stretched exponential 
(solid line) and a Poisson distribution (dashed line). The values for 7 are obtained 
through a least square fit. 
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